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ABSTRACT 

Monte Carlo methods, such as Markov chain Monte Carlo (MCMC) algorithms, have become very popular in signal process¬ 
ing over the last years. In this work, we introduce a novel MCMC scheme where parallel MCMC chains interact, adapting 
cooperatively the parameters of their proposal functions. Furthermore, the novel algorithm distributes the computational effort 
adaptively, rewarding the chains which are providing better performance and, possibly even stopping other ones. These extinct 
chains can be reactivated if the algorithm considers necessary. Numerical simulations shows the benefits of the novel scheme. 

Index Terms — Interacting Parallel MCMC, Adaptive MCMC, cooperative adaptation. 


1. INTRODUCTION 


Markov Chain Monte Carlo (MCMC) algorithms [ Robert a nd Case lla} 2004| are widely employed in signal processing and 
communications for Bayesian inference and optimization I Doucet and Wang 2005 Fitzgerald 2001 Jasra et ak] 2007| Luengo 
and Martino 2013 Martino et al. 2015a| . They draw random samples from a complicated multidimensional target probability 
density function (pdf), 7r(x) with x € 22 C M", generating a Markov chain which converges to 7r(x). The performance, i.e., 
the speed of the convergence, depends strictly on the choice of a suitable proposal function q{x.), and more specifically, on the 
discrepancy between q{x) and 7r(x). 

Speeding up the convergence has motivated an intense research activity. One active research line considers the design of 
an adaptive proposal density within MCMC techniques. Several schemes have been developed in order to tune online the 


parameters of the proposal density, learning them from the previously generated samples jCraiu et al. 2009 Haario et al. 


[2001 1 [Luengo and Martino| [2013[ [Martino et al.||2015a) . On the other hand, the use of parallel chains instead of a single long 
chain has been studied for different reasons. First of all, employing parallel chains allows the use of different proposal pdfs. 
Moreover, another important motivation is the interest in the implementation of MCMC techniques within a parallel architecture 
jCalderhead 2014 Jacob et ahj 2011 Martino et al. 2015b|. Fi nally, a third reason is to speed up the exploration of the state 
space jCorander et al. 2008 Jasra et al. 2007 Martino et al.j 2014 20]Acjb . Several works in the literature are focused 


on producing an interaction among the different parallel chains Casarin et al. 2013| [Craiu et al.[ [2009[ [Martino et al.[ [2014 
[2015c|d| . The exchange of information among the chains can yield jointly two related benefits: produce a faster convergence 
of the chains to the target (e.g., reallocating “lost” chains around a mode of the target [Martino et al. |2015b| ), and help the 
cooperative exploration of the state space (for instance, generating a repulsion among the chains during a certain number of 
iterations [Martino et al. 2015c| ). As a consequence, the combined use of interacting parallel chains and adaptive proposal 
pdfs is considered of great interest in the literature | Craiu et al. [2009 1 . 

In this work, we propose a novel scheme, called parallel adaptive independent Metropolis (PAIM), involving parallel 
MCMC chains which exchange information in order to adapt online their proposal densities. Namely, the interaction among the 
chains is carried out by a cooperative adaptation of the proposal functions. Each MCMC employs a proposal pdf, independent 
from the previous state of the chain. Each proposal is formed by a mixture of two densities, each one determined by two 


parameters, a mean vector and covariance matrix. All the parameters are updated using empirical estimators (as in | Haario 
[et al.[[20OT| [Luengo and Martino[[2013|) applied to the complete set or only to a subset of the previously generated states. The 
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first component of each mixture aims to provide a global adaptation, the complete set of states is used. The second component 
of each mixture is adapted considering only a subset of the previously generated states in order to learn local features of the 
target pdf. These subsets of states are built using a simple clustering-type strategy. This generates a cooperative adaptation, 
preventing that different proposals cover the same region, and allowing them to cover different modes of the target function, 
for instance. Furthermore, the novel algorithm is able to identify the proposal functions better located, and to allocate more 
computational effort into the corresponding chains. Indeed, PAIM also adapts the number of iterations of every chain, stopping 
those chains which are using a bad-located proposal. Moreover, PAIM is also able to turn back on certain chains when it is 
necessary. 


2. GENERAL SETUP 


In many different applications [Corander et al.[ 2008 [Fitzgerald [2001 Jasra et al. |2007[ , it is necessary to draw samples from 
a complicated d-dimensional target probability density function (pdf), 7f(x) a 7r(x), with x G T) C For this purpose, 
we consider the use of N parallel Metropolis-Hastings chains [ Robert and CaselTal |2004 |, each one employing a different 
independent proposal pdf '0„(x), defined as a mixture of two pdfs gi „(x) and q 2 ^n{^)- More specifically, explicitly indicating 
also the parameters of the two components, we have 


lA (x) 


1 




with X G T), n = 1,..., N, where represents a d-dimensional mean vector, and „ is a d x d covariance matrix, for 
i G {1, 2}. In the novel method, the iV parallel chains interact in order to jointly all the parameters for i = 1,2 

and Vn. The parameters /ri „ and Ci „, of the first component gi „ of every proposal are updated for providing a global 
adaptation, whereas the parameters /r 2 ,n and C 2 ,Tt of g 2 ,n are adapted in order to extract local features of the target pdf. 

Furthermore, in the new scheme, each chain is performed for a different number of iterations, Ar„, for n = 1,... ,N. Indeed, 
the proposed algorithm is also able to determine online a subset of the N chains which are obtaining the better performance. 
These chains are employed more often than others, namely increasing the total number of iterations Kn performed by the 
corresponding chains. Let us denote the total number of desired samples as L, chosen in advance by the user. Then, we have 
K\ + K 2 ... + Kjs! = L, i.e., PAIM is stopped when L samples have been generated]^ 


3. ADAPTIVE PARALLEL INDEPENDENT METROPOLIS ALGORITHM 

The parallel adaptive independent Metropolis (PAIM) algorithm works on two different time scales. First of all, the index t 
denotes the current step of the algorithm, with 

t = 0,..., Ttot, (1) 

where the total number of steps Ttot is not decided by the user, but automatically tuned by PAIM (we have always Ttot A L). 
Furthermore, we have have a different iteration index kn for each chain, such that 

kn=0,...,Kn, (2) 

for n = 1,..., iV. The value of each Kn is also decided by PAIM (recall that ^n=i contains 

the indices corresponding to the active chains. At each f-th step, every active chain performs one iteration (i.e., if the j-th chain 
is active, then kj = kj + 1), whereas the inactive chains remain frozen. For every t < Ttmin^ where Ttrain is chosen by the 
user, all the chains are active, i.e, 

At = {l,2,...,N}, t< Ttrain- 

Whereas, for t > Ttrain, we have At Q {1, 2,..., TV}. The interacting adaptation is performed at any t such that Ttrain < 
t < Tstop- We consider the possibility of stopping the adaptation after Tgtop steps, since the adaptation could jeopardize the 
ergodicity of the chains. However, the numerical results described in Section]^ show that the algorithm seems to maintain the 
correct ergodicity properties. 

The adaptation is performed as follows. During the first Ttrain time steps the algorithm simply assigns the new current 
states generated at the f-th step, to one chain among the N possible, according to the minimum Euclidean distance between 
them and the means for n = 1 ,..., TV. Thus, we allow the method to use a few iterations (f = 1,..., Ttrain) to collect 

* Note that we are including all the states in the ‘’burn-in” periods of the N chains. However, they can be discarded if desired. 












and C^, 


information about the target, as in |Haario et al. 20011. Afterwards, the algorithm adapts all the parameters 

for i G {1, 2}, and the set of active chains At, until t = Tstop- On the one hand, the parameters and are updated 
using the empirical estimators for the means and covariances considering only the samples assigned to the n-th chain. On the 
other hand, the parameters and cf are adapted considered all the generated states so far. The chains are turned on or off 
taking into account the number of samples assigned to the n-th chain. PAIM is outlined in details below. 


1. Initialization: 


1.1- Indices: set £ = 0, f = —1, kn = 0, for n = 1,..., N. 

1.2- Parameters: choose the number of chains N and the desired samples L. Select the positive values e, Ttrain, ?stop0the 

initial parameters * = 1; 2, the initial states x„ o the counters rn„ = 1 for n = 1,... ,N. Define the 

initial set of the active chains as Ao = {1,2,..., N}. 


2 . MH steps: 

2.1- Set t = t+l, Z = ^ and i = 0. 


2.2- For all the active chains, i.e., for all the indices j G At- 
(a) Sample x' from the j-th proposal pdf. 



i=l 


(b) Accept Xj^kj+i = x' with probability 


a = min 


^ 7r(x')V>j*^(xj,fc^.) 


Otherwise, set Xj= x^-. 

(c) Set Z = Z U jzi+i = Xj_fc^_|_i}, and i = i -I- 1. 

(d) Set 0^+1 = Xj^kj+i, kj = kj + 1 and £ = £ -|- 1. 

(e) Stop condition: If £ > L then stop and return {6i,... ,6^}. 

3. Assignment (if t < Tstop)' 

3.1- For all the vectors G Z, i.e., for r = 1,..., |Z| ; 

(a) Find the closest mean to Zt (w.r.t. the Euclidean distance), i.e., hnd the index 

n* = argmin ll/xl*), — zJ\. 

n ’ 


(3) 


(4) 


(b) Set = Zi and = m„. -f 1. 

4. Adaptation (if Ttrain <t< Tstop)' 

4.1- Set = 0. 

4.2- For n = 1,... ,N, update the mean vectors. 






2 = 1 


( 5 ) 


^We recall that it is necessary to set Ttrain < Tstop- 










update the covariance matrices, 


'^l,n 
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(t+1) 

2,n 




Z=1 


mn. - 1 


— + ei, 


i=l 


where 1^ is d x d unit matrix. Moreover, if 


(6) 




N 


rrir. 




> 0 , 


(7) 


then set At+i = At+i U {n} (where [a] denotes the smallest integer not smaller than a, with a G K). Otherwise, if 
Qn = 0 the n-th chain is deactivated, i.e., not used at the next iteration. 


5. Repeat from step 2. 


4. FURTHER CONSIDERATIONS ABOUT PAIM 

Observe that the set Z contains all the new states generated in one specihc step. This is refreshed at the beginning of every step 
t. Since the number of active chains is variable, the cardinality of Z is also changing with t. Moreover, we have denoted with 
s„ i the i-th state assigned to the n-th chain. The counter m„ indicates the number of states associated to the n-th chain. Note 
also that for all t > Tt^ain we have 


r*l,n M 5 


CXi = &\ for n = l,...,N. 


Namely, all the functions „ are updated using the empirical estimators of the mean and covariance of the target obtained from 
all the generated samples. 

It is important to remark that PAIM is able to distribute the computational efforts efficiently. Indeed, let us consider the 
initial use of a huge number N of parallel chains, with proposal pdfs localized randomly over the state space. All the chains 
using a bad-localized proposal pdf would be quickly deactivated, i.e., only the chains with proposal pdfs located close to high 
probability regions would survive. However, the algorithm is also able to start up again certain chains if, after some steps, new 
states have been assigned to them. Finally note that, in the description of the algorithm, the parameters are updated using a 
block procedure, but efficient recursive update formulas can be employed (e.g., see | Luengo and Martino 2013| ), so that PAIM 
can be efficiently applied in high dimensional problems. 


5. NUMERICAL RESULTS 


We consider a bi-dimensional “banana-shaped” target distribution [Haario et al. 2001} , which is a benchmark function in the 
literature due to its nonlinear nature. Mathematically, it is expressed as 


t:{xi,X2) oc exp 




x\ x\ \ 


where, we have set B — 10, rji = 4, 772 = 5, and 773 = 5. The goal is to estimate the expected value f7[X], where X = 
[Xi, A 2 ] Tt{xi,X 2 )- We approximately compute the true value i7[X] « [—0.4845,0]^ using an exhaustive deterministic 
numerical method (with an extremely thin grid), in order to obtain the mean square error (MSE) of PAIM and the corresponding 
independent parallel MH chains with the same initial parameters but without adaptation. 

We consider N G {5,10, 50,100} chains with Gaussian proposals * = 1)2 and n = I,... ,N. The 

initial means as well as the initial states are chosen randomly at each run. More specifically, we have x„ 0 ~ ^([~15) ~15] x 
[—15,15]) and /jif'l ~ —15] x [—15,15]). The initial covariance matrices are 0; 0 with a = 10. 

We consider L = 5000 total number of samples. For PAIM, we test Ttrain G {1) 10, 20} and set e = 0.4, T^top = 00 (i.e., we 
never stop the adaptation). 


















Fig. 1. (a) Contour plot of the banana-shaped target if, and the initial means (circles) with N = 20. (a) We show the 
generated samples (dots), the initial (circles) and hnal means (x-marks) of the second component 52 ,n of the proposal of the 
final active chains. The final covariance ellipsoids are also depicted. The hnal mean of the hrst component of the proposals, 
, is shown with a square jointly with the cotTesponding covariance ellipsoid with dashed line. 


The results are averaged 500 over independent simulations, for each combination of parameters. PAIM always provides 
a smaller mean square error (MSB) in estimation of i?[X] (averaging the two components) with respect to the corresponding 
independent parallel chains (IPCs). Table [T] shows the percentage of reduction in MSB obtained used PAIM with different N 
and Ttrain- We can observe the advantage of the proposed adaptation procedure. Indeed, in this example, the minimum train 
period {Ttrain = 1). provides the best results. Pigure[2a) shows the initial means of the second components of the proposals in 


Table 1. Percentage of reduction in the MSB obtained using PAIM, with respect to IPCs. 


'Strain 

N = 5 

Af = 10 

TV = 50 

N = 100 

1 

51.34% 

58.05% 

63.78% 

60.31% 

10 

46.95% 

41.73% 

44.23% 

35.65% 

20 

29.81% 

35.51% 

33.29% 

22.33% 


PAIM, i.e., /X 2 °n’ '^he contour plot of the target tt. Bigure lib) depict the initial (circles) and final (x-marks) conhgurations 

/Q) ('T j 

of the means of the second components of the final active proposals, i.e., „ and /i .2 ^ obtained in one specific run (setting 

N = 20, L = 5 10"^). The final mean = p,{Ttop) Qf j-jjg component common to all the proposals is depicted with a 

square. The hgures also show with solid line the covariance ellipsoids, corresponding to the « 90% of the probability mass, of 
the second components of the proposals of the final active chains. The covariance ellipsoid corresponding to the first common 
component of the proposal pdfs is displayed with a dashed line. Bigureshows the indices corresponding to the active chains 
as function of t, in one specific run (N = 50, L = 10^). 


6. CONCLUSIONS 


In this work, we have introduced a new interacting parallel MCMC scheme, where a cooperative adaptation of the proposal 
densities is performed. Burthermore, the computational effort is efficiently distributed by the the novel method among the set 
of parallel chains, according to their performance. 
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